PARQUET-2171: (followup) add read metrics and hadoop conf integration for vector io reader#1330

Merged
wgtmac merged 1 commit into apache:master from parthchandra:master on Apr 29, 2024

@parthchandra

Contributor

This is a follow-up with minor fixes and additions for the vector IO based file reader.

Jira

  • PARQUET-2171: support Hadoop vector IO

Tests

  • Existing tests are sufficient

Documentation

Existing documentation is sufficient

@parthchandra

Contributor Author

@wgtmac, @steveloughran Some minor additions to the vector IO based file reader. This adds the read metrics that were already added in the serial reader path. It also makes the default construction of the read options pick up the vector IO setting from the Hadoop conf.
Please take a look.
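
For readers unfamiliar with the pattern, the "default construction reads the conf" behavior can be sketched as below. This is a minimal, self-contained illustration and not the Parquet API: `ReadOptionsSketch`, its `Builder`, and the use of a plain `Map` in place of a Hadoop `Configuration` are all hypothetical, and the property key shown is an assumed name used only for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a read-options class; not the Parquet API.
final class ReadOptionsSketch {
    // Assumed property key and default, used for illustration only.
    static final String VECTORED_IO_ENABLED_KEY = "parquet.hadoop.vectored.io.enabled";
    static final boolean VECTORED_IO_DEFAULT = false;

    final boolean useVectoredIo;

    private ReadOptionsSketch(boolean useVectoredIo) {
        this.useVectoredIo = useVectoredIo;
    }

    // A plain Map stands in for a Hadoop Configuration here.
    static Builder builder(Map<String, String> conf) {
        return new Builder(conf);
    }

    static final class Builder {
        private boolean useVectoredIo;

        Builder(Map<String, String> conf) {
            // The builder seeds its default from the configuration, so callers
            // who never touch the setter still honor the configured value.
            String raw = conf.get(VECTORED_IO_ENABLED_KEY);
            this.useVectoredIo =
                raw == null ? VECTORED_IO_DEFAULT : Boolean.parseBoolean(raw.trim());
        }

        // An explicit call overrides whatever the configuration said.
        Builder withUseVectoredIo(boolean enabled) {
            this.useVectoredIo = enabled;
            return this;
        }

        ReadOptionsSketch build() {
            return new ReadOptionsSketch(useVectoredIo);
        }
    }
}

final class ReadOptionsDemo {
    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(ReadOptionsSketch.VECTORED_IO_ENABLED_KEY, "true");

        ReadOptionsSketch fromConf = ReadOptionsSketch.builder(conf).build();
        ReadOptionsSketch overridden =
            ReadOptionsSketch.builder(conf).withUseVectoredIo(false).build();

        System.out.println("from conf:  " + fromConf.useVectoredIo);
        System.out.println("overridden: " + overridden.useVectoredIo);
    }
}
```

Seeding the default in the builder's constructor, rather than at `build()` time, keeps an explicit setter call authoritative over the configured value.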

@wgtmac wgtmac merged commit 337d082 into apache:master Apr 29, 2024
@parthchandra

Contributor Author

Thank you @wgtmac!

@steveloughran

Contributor

Looks great. If there's another 1.14.0 RC, will this go into it?

Note that we create lots and lots of IOStatistics; for vector reads we include the number of bytes read and discarded along with all the other timings. My WIP to make that accessible via reflection will help, but it would still need work in Parquet to aggregate the values.
apache/hadoop#6686
You can have all the stats as a piece of JSON if that helps; the Parquet library then just has its own copy of the stats class to parse it...
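
The aggregation step mentioned above could look roughly like the following. This is only a sketch under stated assumptions: `StatsAggregatorSketch` is a hypothetical class, the counters are modeled as plain name-to-value maps rather than Hadoop's IOStatistics types, and the counter names are made up for the example.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical aggregator; counters are modeled as simple name -> value maps.
final class StatsAggregatorSketch {
    private final Map<String, Long> totals = new TreeMap<>();

    // Fold one reader's counters into the running totals by summing per name.
    void add(Map<String, Long> counters) {
        counters.forEach((name, value) -> totals.merge(name, value, Long::sum));
    }

    // Return a copy so callers cannot mutate the internal state.
    Map<String, Long> totals() {
        return new TreeMap<>(totals);
    }
}

final class StatsAggregatorDemo {
    public static void main(String[] args) {
        StatsAggregatorSketch aggregator = new StatsAggregatorSketch();

        // Counter names below are illustrative, not real IOStatistics keys.
        aggregator.add(Map.of("bytes_read", 4096L, "bytes_discarded", 128L));
        aggregator.add(Map.of("bytes_read", 2048L));

        aggregator.totals().forEach((name, value) ->
            System.out.println(name + " = " + value));
    }
}
```

Summing is only appropriate for counter-style values; timings such as minimums, maximums, or means would need their own merge rules.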

@wgtmac

wgtmac commented May 6, 2024

Member

I think this is already included in the 1.14.0 RC0/RC1.

clairemcginty pushed a commit to clairemcginty/parquet-mr that referenced this pull request May 17, 2024